1 Introduction

Market indices are widely used to track the performance
of stocks or to design investment portfolios [1]. This paper
initiates a rigorous mathematical study of the computational
complexity of the art of designing proxies for such indices.
While there are several results on selecting such proxies
(or portfolios) in an on-line manner (see, for example, [2]
and [3]), we look at off-line algorithms for designing proxies
based on historical data. In particular, we show that all
combinations of three fundamental problems (such as tracking
or outperforming a full market index) with four commonly
used indices give NP-complete problems and are therefore
computationally hard.

To formally define market indices, let B be a set of b
stocks in a market. Let $S_{i,t} > 0$ be the price of the i-th
stock at time t. Let $w_i$ be the number of outstanding
shares of the i-th stock. We assume that $w_i$ does not
change with time. This paper discusses computational
complexity issues regarding four kinds of market indices
currently in use [1]. These indices are calculated by the
following formulas, which can be multiplied by arbitrary
constants to arrive at desired starting index values at
time 0.

• The price-weighted index of B at time t is

\[ \Phi_1(B,t) = \frac{1}{b} \sum_{i=1}^{b} S_{i,t}. \]

The Dow Jones Industrial Average is calculated in this
manner for some B consisting of thirty stocks.

• The value-weighted index of B at time t is

\[ \Phi_2(B,t) = \frac{\sum_{i=1}^{b} w_i S_{i,t}}{\sum_{i=1}^{b} w_i S_{i,0}}. \]

The Standard & Poor's 500 is computed in this way
with respect to 500 stocks.

• The equal-weighted index of B at time t is

\[ \Phi_3(B,t) = \frac{1}{b} \sum_{i=1}^{b} \frac{S_{i,t}}{S_{i,0}}. \]

The index published by the Indicator Digest is calculated
by this method, involving stocks listed on the New
York Stock Exchange.

• The price-relative index of B at time t is

\[ \Phi_4(B,t) = \Bigl( \prod_{i=1}^{b} \frac{S_{i,t}}{S_{i,0}} \Bigr)^{1/b}. \]

The Value Line Index is computed by this formula.
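
As an illustration, the four index types can be computed directly from price lists. The sketch below assumes the standard textbook forms of these indices (arithmetic mean of prices, capitalization ratio, mean of price relatives, geometric mean of price relatives); the normalizing constants, function names, and sample data are assumptions for illustration and are not taken from the paper.

```python
import math

def price_weighted(prices_t):
    """Arithmetic mean of the current prices."""
    return sum(prices_t) / len(prices_t)

def value_weighted(prices_t, prices_0, shares):
    """Current market capitalization divided by capitalization at time 0."""
    cap_t = sum(w * s for w, s in zip(shares, prices_t))
    cap_0 = sum(w * s for w, s in zip(shares, prices_0))
    return cap_t / cap_0

def equal_weighted(prices_t, prices_0):
    """Arithmetic mean of the price relatives S_{i,t} / S_{i,0}."""
    relatives = [s / s0 for s, s0 in zip(prices_t, prices_0)]
    return sum(relatives) / len(relatives)

def price_relative(prices_t, prices_0):
    """Geometric mean of the price relatives, computed via logarithms."""
    logs = [math.log(s / s0) for s, s0 in zip(prices_t, prices_0)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical three-stock market: one stock doubles, one is flat, one halves.
prices_0 = [10.0, 20.0, 40.0]
prices_t = [20.0, 20.0, 20.0]
shares = [100, 50, 25]

if __name__ == "__main__":
    print(price_weighted(prices_t))
    print(value_weighted(prices_t, prices_0, shares))
    print(equal_weighted(prices_t, prices_0))
    print(price_relative(prices_t, prices_0))
```

On this sample the arithmetic and geometric averages of the price relatives differ, which is why the equal-weighted and price-relative indices are treated as separate index types.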

There are numerous reasons why stock investors and 
money managers would want to invest in a subset of 
stocks rather than those of a whole market [1]. For 
instance, small investors certainly do not have sufficient 
capital to invest in every stock in the market. Logically, 
such investors would attempt to choose a small subset 
of stocks which hopefully can perform roughly as well as 
or even outperform the market as a whole. They then 
face difficult trade-offs between returns and risks. For 
these and other reasons of optimization, we formulate 
three natural computational problems for the design of 
market indices. Given a market M consisting of m
stocks, we wish to choose a subset $M_k$ of at most k
stocks and calculate an index of $M_k$, which is called a
k-proxy of the corresponding index of the whole market
M (we sometimes refer to $M_k$ as a portfolio). Our goal
is to choose $M_k$ so that the resulting k-proxy tracks or
outperforms the corresponding index of M. This paper
shows that designing proxies for the above four indices
based on historical data is computationally hard.

2 Problem Formulations

In this section we formally define three basic problems
related to selecting k-proxies, or portfolios.

PROBLEM 1. (tracking an index)

Input: A market M of m stocks, their prices
$S_{i,t} > 0$ for t = 0, ..., f, their numbers $w_i$ of
outstanding shares, a real $\epsilon_1 > 0$, an integer k > 0,
and some $j \in \{1,2,3,4\}$ to indicate the desired type
of index.

Output: A subset $M_k$ of at most k stocks in M
such that for all t = 1, ..., f,

\[ \left| \frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} - \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \right| \le \epsilon_1 \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)}. \]
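
A brute-force search makes the tracking problem concrete. The sketch below assumes the price-weighted index and a relative-error reading of the tracking condition (the growth of the proxy since time 0 stays within a fraction $\epsilon_1$ of the growth of the market); the names `tracks` and `find_tracking_proxy` and the sample prices are hypothetical. It enumerates every subset of size at most k, so its running time grows exponentially with k. It is meant as an illustration of the problem statement, not as a practical algorithm.

```python
from itertools import combinations

def price_weighted(prices, stocks, t):
    """Price-weighted index of the given stocks at time t (prices[i][t])."""
    return sum(prices[i][t] for i in stocks) / len(stocks)

def tracks(prices, subset, eps1):
    """True if the subset's normalized index stays within eps1 (relative) of the market's."""
    market = range(len(prices))
    horizon = len(prices[0]) - 1
    for t in range(1, horizon + 1):
        proxy = price_weighted(prices, subset, t) / price_weighted(prices, subset, 0)
        full = price_weighted(prices, market, t) / price_weighted(prices, market, 0)
        if abs(proxy - full) > eps1 * full:
            return False
    return True

def find_tracking_proxy(prices, k, eps1):
    """Return the first subset of at most k stocks that tracks the market, or None."""
    m = len(prices)
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            if tracks(prices, subset, eps1):
                return subset
    return None

# Hypothetical price histories for three stocks over times 0, 1, 2.
prices = [
    [10.0, 11.0, 12.0],
    [20.0, 22.0, 24.0],
    [5.0, 10.0, 2.0],
]

if __name__ == "__main__":
    print(find_tracking_proxy(prices, k=1, eps1=0.15))
    print(find_tracking_proxy(prices, k=2, eps1=0.05))
```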



PROBLEM 2. (outperforming an index)

Input: A market M of m stocks, their prices
$S_{i,t} > 0$ for t = 0, ..., f, their numbers $w_i$ of
outstanding shares, a real $\epsilon_2 > 0$, an integer k > 0,
and some $j \in \{1,2,3,4\}$ to indicate the desired type
of index.

Output: A subset $M_k$ of at most k stocks in M
such that for all t = 1, ..., f,

\[ \frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} \ge \epsilon_2 \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)}. \]
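
Checking whether a given subset satisfies the outperforming condition is straightforward; the difficulty lies in finding such a subset. The sketch below assumes that both indices are normalized by their values at time 0 and uses an equal-weighted index as the example. The function names and the price data are hypothetical.

```python
def equal_weighted(prices, stocks, t):
    """Mean of the price relatives prices[i][t] / prices[i][0] over the given stocks."""
    return sum(prices[i][t] / prices[i][0] for i in stocks) / len(stocks)

def outperforms(index, prices, subset, eps2):
    """True if the subset's normalized index is at least eps2 times the market's at every t >= 1."""
    market = list(range(len(prices)))
    horizon = len(prices[0]) - 1
    for t in range(1, horizon + 1):
        proxy = index(prices, subset, t) / index(prices, subset, 0)
        full = index(prices, market, t) / index(prices, market, 0)
        if proxy < eps2 * full:
            return False
    return True

# Hypothetical price histories for three stocks over times 0, 1, 2.
prices = [
    [10.0, 11.0, 12.0],
    [20.0, 22.0, 24.0],
    [5.0, 10.0, 2.0],
]

if __name__ == "__main__":
    print(outperforms(equal_weighted, prices, [0], eps2=0.75))
    print(outperforms(equal_weighted, prices, [0], eps2=1.0))
```

With $\epsilon_2 = 1$ the condition asks the proxy to match or beat the market at every time step, while values below 1 tolerate a bounded shortfall.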

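For the final problem, we need a few extra definitions
in order to analyze the volatility of a set of stocks.
Let B be a set of stocks as defined in §1.

• The one-period return of $\Phi_j$ for B at time t ≥ 1 is

\[ R_j(B,t) = \ln \frac{\Phi_j(B,t)}{\Phi_j(B,t-1)}. \]

• The average return of $\Phi_j$ for B up to time t ≥ 1 is

\[ \bar{R}_j(B,t) = \frac{1}{t} \sum_{i=1}^{t} R_j(B,i). \]

• The volatility of $\Phi_j$ for B up to time t ≥ 2 is

\[ \Delta_j(B,t) = \sqrt{ \frac{1}{t-1} \sum_{i=1}^{t} \bigl( R_j(B,i) - \bar{R}_j(B,t) \bigr)^2 }. \]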
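
The three quantities above can be computed from a series of index values. The sketch below assumes logarithmic one-period returns and takes volatility to be the sample standard deviation of those returns (divisor t - 1); both the divisor and the function names are assumptions made for illustration.

```python
import math

def one_period_returns(index_values):
    """Log returns R(1), ..., R(f) from index values at times 0, ..., f."""
    return [
        math.log(index_values[t] / index_values[t - 1])
        for t in range(1, len(index_values))
    ]

def average_return(returns, t):
    """Mean of the first t one-period returns (t >= 1)."""
    return sum(returns[:t]) / t

def volatility(returns, t):
    """Sample standard deviation of the first t one-period returns (t >= 2)."""
    mean = average_return(returns, t)
    squared = sum((r - mean) ** 2 for r in returns[:t])
    return math.sqrt(squared / (t - 1))

if __name__ == "__main__":
    # Hypothetical index values: steady doubling gives identical returns,
    # so the volatility is (numerically) zero.
    steady = one_period_returns([1.0, 2.0, 4.0, 8.0])
    print(volatility(steady, 3))
    # Alternating gains and losses produce a positive volatility.
    print(volatility([0.1, -0.1], 2))
```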



PROBLEM 3. (sacrificing return for less volatility)

Input: A market M of m stocks, their prices
$S_{i,t} > 0$ for t = 0, ..., f, their numbers $w_i$ of
outstanding shares, two reals $\alpha, \beta > 0$, an integer
k > 0, and some $j \in \{1,2,3,4\}$ to indicate the
desired type of index.

Output: A subset $M_k$ of at most k stocks in M
such that

\[ \frac{\Phi_j(M_k,t)}{\Phi_j(M_k,0)} \ge \alpha \cdot \frac{\Phi_j(M,t)}{\Phi_j(M,0)} \quad \text{for all } t = 1, \ldots, f; \]
\[ \Delta_j(M_k,s) \le \beta \cdot \Delta_j(M,s) \quad \text{for all } s = 2, \ldots, f. \]
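
The two conditions of Problem 3 can be checked together for a candidate subset. The sketch below assumes the price-weighted index, logarithmic returns, and a sample standard deviation for volatility; the function `acceptable` and the price data are hypothetical and serve only to illustrate the trade-off between return (controlled by $\alpha$) and volatility (controlled by $\beta$).

```python
import math

def price_weighted(prices, stocks, t):
    """Price-weighted index of the given stocks at time t (prices[i][t])."""
    return sum(prices[i][t] for i in stocks) / len(stocks)

def volatility(index_values, s):
    """Sample standard deviation of the log returns R(1), ..., R(s), for s >= 2."""
    returns = [
        math.log(index_values[t] / index_values[t - 1]) for t in range(1, s + 1)
    ]
    mean = sum(returns) / s
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / (s - 1))

def acceptable(prices, subset, alpha, beta):
    """True if the subset keeps at least an alpha share of the market's growth
    while its volatility is at most beta times the market's."""
    market = list(range(len(prices)))
    horizon = len(prices[0]) - 1
    proxy = [price_weighted(prices, subset, t) for t in range(horizon + 1)]
    full = [price_weighted(prices, market, t) for t in range(horizon + 1)]
    for t in range(1, horizon + 1):
        if proxy[t] / proxy[0] < alpha * full[t] / full[0]:
            return False
    for s in range(2, horizon + 1):
        if volatility(proxy, s) > beta * volatility(full, s):
            return False
    return True

# Hypothetical price histories for three stocks over times 0, 1, 2.
prices = [
    [10.0, 11.0, 12.0],
    [20.0, 22.0, 24.0],
    [5.0, 10.0, 2.0],
]

if __name__ == "__main__":
    print(acceptable(prices, [0], alpha=0.8, beta=0.5))
    print(acceptable(prices, [2], alpha=0.8, beta=0.5))
```

In this sample the steadily rising stock 0 gives up some return relative to the market but is far less volatile, whereas the erratic stock 2 fails the return condition at the final time step.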

3 Results

In this abstract, we simply quote the main results — all 
the proofs can be found in the full paper. 

THEOREM 3.1. Let $\epsilon_1$ be any error bound satisfying
$0 < \epsilon_1 < 1$ and specified using $n^{O(1)}$ bits in fixed-point
notation. Then the tracking problem with error bound $\epsilon_1$
is NP-hard for the price-weighted index, value-weighted
index, and equal-weighted index.

THEOREM 3.2. Let $\epsilon_2$ be any value satisfying
$0 < \epsilon_2 < n^c$ for some constant c. Then the problem of
outperforming the market average with bound $\epsilon_2$ is NP-hard
for the price-weighted index, value-weighted index,
equal-weighted index, and price-relative index.

THEOREM 3.3. Let $\alpha$ and $\beta$ be values expressed using
[illegible] bits in fixed-point binary notation, and satisfying
$0 < \alpha < n^{O(1)}$ and $\beta = \Omega$([illegible]). Then the problem of
sacrificing return for less volatility is NP-complete for
the price-weighted index, value-weighted index,
equal-weighted index, and price-relative index.

The one result that must be separated from the 
above is the tracking problem for the price-relative 
index. In order for our reduction to work in this case, 
we were required to reduce the range of possible values
for $\epsilon_1$.

THEOREM 3.4. Let $\epsilon_1$ be any error bound satisfying
$0 < \epsilon_1 < 1$ and specified using O(log n) bits in fixed-point
notation. Then the tracking problem for a price-relative
index with error bound $\epsilon_1$ is NP-hard.

Dividing sentences in chunks of words is a useful preprocessing step for parsing,
information extraction and information retrieval. Ramshaw and Marcus (1995) have
introduced a "convenient" data representation for chunking by converting it to a tagging
task. In this paper we will examine seven different data representations for the problem of
recognizing noun phrase chunks. We will show that the data representation choice has
a minor influence on chunking performance. However, equipped with ...

Trigrams'nTags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims
found elsewhere in the literature, we argue that a tagger based on Markov models 
performs at least as well as other current approaches, including the Maximum Entropy 
framework. A recent comparison has even shown that TnT performs significantly better for 
the tested corpora. We describe the basic model of TnT, the techniques used for smoothing 
and for handling unknown words. Furthermore, we present evaluat ... 

to suggest the use of a complementary "backup" method to increase the robustness of any
hand-crafted or machine-learning-based NE tagger; and second, to explore the 
effectiveness of using more fine-grained evidence— namely, syntactic and semantic 
contextual knowledge— in classifying NEs. 

Automatic acquisition of lexical knowledge is critical to a wide range of natural language
processing tasks. Especially important is knowledge about verbs, which are the primary 
source of relational information in a sentence— the predicate-argument structure that 
relates an action or state to its participants (i.e., who did what to whom). In this work, we 
report on supervised learning experiments to automatically classify three major types of 
English verbs, based on their argument structure— sp ... 

It is widely accepted that tagging text with semantic information would improve the quality
of lexical learning in corpus-based NLP methods. However available on-line taxonomies are 
rather entangled and introduce an unnecessary level of ambiguity. The noise produced by 
the redundant number of tags often overrides the advantage of semantic tagging. In this 
paper we propose an automatic method to select from WordNet a subset of domain- 
appropriate categories that effectively reduce the overambiguit ... 

Information Retrieval (IR) is an important application area of Natural Language Processing
(NLP) where one encounters the genuine challenge of processing large quantities of 
unrestricted natural language text. While much effort has been made to apply NLP 
techniques to IR, very few NLP techniques have been evaluated on a document collection 
larger than several megabytes. Many NLP techniques are simply not efficient enough, and 
not robust enough, to handle a large amount of text. This paper propos ... 

extraction task for which corpora in several languages are available. Using the results of
the statistical analysis, we propose an algorithm for lower bound estimation for Named 
Entity corpora and discuss the significance of the cross-lingual comparisons provided by the 
analysis. 

Recent proposals to apply data mining systems to problems in law enforcement, national
security, and fraud detection have attracted both media attention and technical critiques of 
their expected accuracy and impact on privacy. Unfortunately, the majority of technical 
critiques have been based on simplistic assumptions about data, classifiers, inference 
procedures, and the overall architecture of such systems. We consider these critiques in 
detail, and we construct a simulation model that more cl ... 

While reading The Wall Street Journal this morning, I found myself thinking about the flood
of information I get electronically. In many ways, reading the newspaper is still a more 
pleasant experience than sorting through (virtual) piles of e-mail, newsgroups, online 
forums, Web sites, and other Internet information sources. Can more of the lessons of 
newspaper design be applied to online information tools? 

Economist Ronald Coase was not suggesting that because the size of firms is tied to

The fastest growing market segment of new computer users is senior citizens. For many
seniors, the computer is a puzzling device whose inner workings will forever remain a 
mystery. A lack of understanding of how a computer works doesn't necessarily mean that 
seniors' interest in using a computer is diminished. But it does mean that most of the 
documentation written to help them learn computing skills and use common applications is
inappropriate. Based on a series of free seminars conducted in th ... 

concept definitions. In case of large and complex application domains this task can be
lengthy, costly, and controversial (since different persons may have different points of view 
about the same concept). To reduce time, cost (an ... 

People, no matter how rational they are, usually act on the basis of incomplete information.
If they are rational they recognize their own ignorance and reflect carefully on what they 
know and what they do not know, before choosing how to act. Furthermore, when rational 
agents interact, they also think about what the others know, and what the others know 
about what they know, before choosing how to act. Failing to do so can be disastrous. When 
the notorious evil genius Professor Moriarty conf ... 

51 Holland classifier ...
Andreas Geyer-Schulz 

June 1995 ACM SIGAPL APL Quote Quad , Proceedings of the international conference 
on Applied programming languages, volume 25 issue 4 

A Holland classifier system is an adaptive, general purpose machine learning system which 
is designed to operate in noisy environments with infrequent and often incomplete feedback. 
Examples of such environments are financial markets, stock management systems, or 
chemical processes. In financial markets, a Holland classifier system would develop trading 
strategies, in a stock management system order heuristics, and in a chemical plant it would 
perform process control. In this paper we descr ... 

Keywords: bucket brigade, classifier system, genetic algorithm, machine learning, 
triggered operations 



52 The department of defense contractor investment policy model
Richard William Barker, Kenneth C. Konwin 

December 1983 Proceedings of the 15th conference on Winter Simulation - Volume 2 

The primary aim of this system dynamics study was to understand aerospace defense 
contractor investment behavior, and evaluate government policies designed to motivate 
investment in capital within the current fighter/attack aircraft manufacturing market 
structure. The approach taken analyzes investment projects and capital equipment 
expenditure flows that arise as a result of the overall decision framework of the firm. The 
continuous corporate computer simulation model addresses several of ... 

53 Session 9C: evolution, adaptation, and learning II; Learning and exploiting context in
agents

Bruce Edmonds 

July 2002 Proceedings of the first international joint conference on Autonomous 
agents and multiagent systems: part 3 

The use of context can considerably facilitate reasoning by restricting the beliefs reasoned 
upon to those relevant and providing extra information specific to the context. Despite the 
use and formalization of context being extensively studied both in AI and ML, context has 
not been much utilized in agents. This may be because many agents are only applied in a 
single context, and so these aspects are implicit in their design, or it may be that the need
to explicitly encode information about vari ... 

Keywords: biological analogy, cognitive analogy, context, deduction, evolutionary 
computation, genetic programming, integration, learning 



54 Search 2: Ev ...

Taher H. Haveliwala, Aristides Gionis, Dan Klein, Piotr Indyk 

May 2002 Proceedings of the eleventh international conference on World Wide Web 

Finding pages on the Web that are similar to a query page (Related Pages) is an important 
component of modern search engines. A variety of strategies have been proposed for 
answering Related Pages queries, but comparative evaluation by user studies is expensive, 
especially when large strategy spaces must be searched (e.g., when tuning parameters). We 
present a technique for automatically evaluating strategies using Web hierarchies, such as 
Open Directory, in place of user feedback. We apply this ... 

Keywords: evaluation, open directory project, related pages, search, similarity search 



55 How open data network ...

Lynn A. Streeter, Robert E. Kraut, Henry C. Lucas, Laurence Caby 
July 1996 Communications of the ACM, volume 39 issue 7 



56 The Killer aps: how ...
Ian Clark 

June 2000 ACM SIGAPL APL Quote Quad, Proceedings of the international conference
on APL-Berlin-2000 conference, volume 30 issue 4

I've been a programmer for over 30 years, often at the leading edge, never in a classic IT 
shop. I've worked with several vendors' mainframes, midis and micros, for big firms, small 
firms, central government, educational establishments, and for myself; in colleges, 
universities, laboratories and classrooms; in England, in Europe, and in the USA. I've been 
involved in some total flops, but in one or two real successes too. I could never tell from the 
outset which it was going to be. The more I se ... 

57 Managing Your Money with GnuCash
Robert Merkel 

April 2001 Linux Journal 

A tutorial on a powerful, free accounting program. 

58 Symmetric and Asymmetric Encryption
Gustavus J. Simmons 

December 1979 ACM Computing Surveys (CSUR), Volume 11 Issue 4 


59 Aggregate ...

Robert Ross, V. S. Subrahmanian, John Grant 

January 2005 Journal of the ACM (JACM), volume 52 issue 1

Though extensions to the relational data model have been proposed in order to handle 
probabilistic information, there has been very little work to date on handling aggregate 
operators in such databases. In this article, we present a very general notion of an 
aggregate operator and show how classical aggregation operators (such as COUNT, SUM, 
etc.) as well as statistical operators (such as percentiles, variance, etc.) are special cases of 
this general definition. We devise a formal linear program ... 

Keywords: Aggregates, probabilistic relational databases 



60 Automatic ... of corpora

Alessandro Cucchiarelli, Paola Velardi 

March 1997 Proceedings of the fifth conference on Applied natural language 
processing 

It is widely accepted that tagging text with semantic information would improve the quality 
of lexical learning in corpus-based NLP methods. However available on-line taxonomies are 
rather entangled and introduce an unnecessary level of ambiguity. The noise produced by 
the redundant number of tags often overrides the advantage of semantic tagging. In this 
paper we propose an automatic method to select from WordNet a subset of domain- 
appropriate categories that effectively reduce the overambiguit ... 